Chapter 1 



1. A hospital uses an application that stores patient X-ray data in the form of large
binary objects in an Oracle database. The application is hosted on a UNIX server, 
and the hospital staff accesses the X-ray records through a Gigabit Ethernet 
backbone. An EMC CLARiiON storage array provides storage to the UNIX 
server, which has 6 TB of usable capacity. Explain the core elements of the data 
center. What are the typical challenges the storage management team may face in 
meeting the service-level demands of the hospital staff? Describe how the value 
of this patient data might change over time. 

Solution/Hint: 

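Core elements of the data center:

- Application: the X-ray records application
- Database: Oracle
- Server and operating system: UNIX server
- Network: LAN (Gigabit Ethernet backbone), SAN
- Storage array: EMC CLARiiON storage array

Challenges:

- Long-term preservation of large and growing volumes of X-ray images
- High cost of keeping all of this data on high-performance primary storage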

How the value of patient data might change over time:

- During the first 60 days, patient data is accessed frequently.

- After that, it is rarely accessed, so it can be moved to CAS (content-addressed storage).
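The aging policy in the hint can be expressed as a simple rule. The sketch below assumes a hypothetical record-creation date and the 60-day threshold from the hint; the tier names are illustrative placeholders, not the interface of any ILM product.

```python
from datetime import date, timedelta

# Assumed policy from the hint: records stay on primary storage for 60 days,
# then become candidates for migration to CAS.
ACTIVE_PERIOD = timedelta(days=60)

def target_tier(created: date, today: date) -> str:
    """Return the hypothetical tier name for an X-ray record of a given age."""
    if today - created <= ACTIVE_PERIOD:
        return "primary"  # frequently accessed: keep on the storage array
    return "cas"          # rarely accessed fixed content: archive to CAS

today = date(2024, 6, 30)
print(target_tier(date(2024, 6, 1), today))  # 29 days old -> primary
print(target_tier(date(2024, 3, 1), today))  # 121 days old -> cas
```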

2. An engineering design department of a large company maintains over 600,000
engineering drawings that its designers access and reuse in their current projects,
modifying or updating them as required. The design team wants instant access to 
the drawings for its current projects, but is currently constrained by an 
infrastructure that is not able to scale to meet the response time requirements. The 
team has classified the drawings as "most frequently accessed," "frequently 
accessed," "occasionally accessed," and "archive." 

• Suggest and provide the details for a strategy for the design department 
that optimizes the storage infrastructure by using ILM. 

• Explain how you will use "tiered storage" based on access frequency. 

• Detail the hardware and software components you will need to implement 
your strategy. 

• Research products and solutions currently available to meet the solution 
you are proposing. 

Solution/Hint: 

• Classify the data according to access frequency or value and use tiered 
storage that optimizes the infrastructure cost and performance by using 
ILM. 

• Storage requirements can be classified as follows:

frequently used data should be placed on a high-end storage array,
occasionally accessed data on a low-end storage array,
and archived data on a specialized CAS system.
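A minimal sketch of how the classification could drive placement, assuming the four access classes from the question and a hypothetical class-to-tier table that follows the hint (the tier labels and drawing IDs are made up for illustration):

```python
from collections import defaultdict

# Hypothetical mapping of the team's four access classes to storage tiers,
# following the hint: frequently used data on a high-end array, occasionally
# accessed data on a lower-cost array, and archived drawings on CAS.
TIER_FOR_CLASS = {
    "most frequently accessed": "high-end array",
    "frequently accessed": "high-end array",
    "occasionally accessed": "low-end array",
    "archive": "CAS",
}

def place(drawings: dict[str, str]) -> dict[str, list[str]]:
    """Group drawing IDs by target tier, given each drawing's access class."""
    placement: dict[str, list[str]] = defaultdict(list)
    for drawing_id, access_class in drawings.items():
        placement[TIER_FOR_CLASS[access_class]].append(drawing_id)
    return dict(placement)

print(place({"DWG-001": "most frequently accessed",
             "DWG-002": "occasionally accessed",
             "DWG-003": "archive"}))
```

In practice, an ILM tool would re-run this classification periodically so that drawings move down (or back up) the tiers as their access frequency changes.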



• Hardware and software components needed: 

- High-end and mid-range storage arrays
- Content-addressed storage (CAS)
- FC SAN, LAN
- Servers

Software:

- ILM tool

• Research the following products and solutions (www.emc.com):

- Storage arrays: CLARiiON / Symmetrix
- CAS: Centera
- FC SAN: switches / directors
- ILM strategy

3. The marketing department at a mid-size firm is expanding. New hires are being 
added to the department and they are given network access to the department's 
files. IT has given marketing a networked drive on the LAN, but it keeps reaching 
capacity every third week. Current capacity is 500 MB (and growing), with 
hundreds of files. Users are complaining about LAN response times and capacity. 
As the IT manager, what could you recommend to improve the situation? 
Solution/Hint: 

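- NAS: a NAS device is optimized for file serving, so it offloads this work from general-purpose servers and its capacity can be scaled as the department grows, addressing both the capacity and the response-time complaints.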

4. A large company is considering a storage infrastructure — one that is scalable and 
provides high availability. More importantly, the company also needs 
performance for its mission-critical applications. Which storage topology would 
you recommend (SAN, NAS, IP SAN) and why? 

Solution/Hint: 

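SAN is the recommended solution, because it offers high scalability and availability (using redundant switches or directors) and delivers the block-level performance that mission-critical applications require.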



Chapter 2 



1. What are the benefits of using multiple HBAs on a host?
Solution/Hint: 

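High availability: if one HBA or its path fails, I/O continues through another HBA. With multipathing software, I/O can also be load-balanced across the HBAs for better performance.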

2. An application specifies a requirement of 200 GB to host a database and other
files. It also specifies that the storage environment should support 5,000 IOPS 
during its peak processing cycle. The disks available for configuration provide 
66GB of usable capacity, and the manufacturer specifies that they can support a 
maximum of 140 IOPS. The application is response time sensitive and disk 
utilization beyond 60 percent will not meet the response time requirements of the 
application. Compute and explain the theoretical basis for the minimum number 
of disks that should be configured to meet the requirements of the application. 
Solution/Hint: 

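Number of disks required = max(disks needed for capacity, disks needed for IOPS)
To meet the capacity requirement = 200 GB / 66 GB = 3.03, rounded up to 4 disks
To meet the IOPS requirement: because utilization beyond 60 percent violates the response-time requirement, each disk can be counted on for only 140 × 0.6 = 84 IOPS, so 5,000 IOPS / 84 IOPS = 59.5, rounded up to 60 disks
Number of disks required = max(4, 60) = 60 disks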
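The same arithmetic as a small Python helper, rounding each requirement up to a whole disk; the function name and parameters are illustrative:

```python
import math

def disks_required(capacity_gb, usable_gb_per_disk,
                   required_iops, rated_iops, max_utilization):
    """Minimum disk count that satisfies both capacity and IOPS."""
    by_capacity = math.ceil(capacity_gb / usable_gb_per_disk)
    # Each disk may only be driven to max_utilization of its rated IOPS.
    by_iops = math.ceil(required_iops / (rated_iops * max_utilization))
    return max(by_capacity, by_iops), by_capacity, by_iops

print(disks_required(200, 66, 5000, 140, 0.6))  # (60, 4, 60)
```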

3. Which components constitute the disk service time? Which component
contributes the largest percentage of the disk service time in a random I/O 
operation? 
Solution/Hint: 

Components: seek time, rotational latency, and data transfer time.

Largest contributor in random I/O: seek time.

4. Why do formatted disks have less capacity than unformatted disks?
Solution/Hint: 

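A storage device must be formatted before it can be used, and formatting consumes
part of the raw capacity. In addition to user data, each sector stores information
such as the sector, head, and track numbers, which helps the controller locate data
but takes up space on the disk. Formatting a drive with a file system (common
formats are FAT32, NTFS, and ext2) also allocates a portion of the storage space to
file system metadata, which catalogs the data on the drive.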

5. The average I/O size of an application is 64 KB. The following specifications are
available from the disk manufacturer: average seek time = 5 ms, 7,200 rpm, 
transfer rate = 40 MB/s. Determine the maximum IOPS that could be performed 
with the disk for this application. Taking this case as an example, explain the 
relationship between disk utilization and IOPS. 
Solution/Hint: 

The disk service time ($R_s$) is a key measure of disk performance; $R_s$, together
with the disk utilization rate ($U$), determines the I/O response time for
applications.

The total disk service time is the sum of the seek time ($E$), the rotational
latency ($L$), and the internal transfer time ($X$):

$R_s = E + L + X$

$E$ is determined by the randomness of the I/O requests; $L$ and $X$ are derived
from the technical specifications provided by the disk vendor.

- Average seek time of 5 ms in a random I/O environment: $E = 5$ ms
- Disk rotation speed of 7,200 rpm, from which the rotational latency is one half
of the time taken for a full rotation: $L = 0.5 \times (60/7200)$ s $= 4.167$ ms
- Internal data transfer rate of 40 MB/s, from which the internal transfer time is
derived for the 64 KB block size: $X = 64\ \text{KB} / (40\ \text{MB/s}) = 1.6$ ms

Consequently:

$R_s = 5 + 4.167 + 1.6 = 10.767$ ms

The maximum number of I/Os serviced per second is:

$\text{IOPS} = 1/R_s = 1/(10.767 \times 10^{-3}\ \text{s}) = 92.876$ IOPS

This maximum assumes the disk is 100 percent utilized. As utilization approaches
100 percent, IOPS approaches this ceiling but response time rises steeply, so a
response-time-sensitive application must run the disk at lower utilization and
accept proportionally fewer IOPS; at 70 percent utilization, for example, the disk
delivers about $0.7 \times 92.876 \approx 65$ IOPS.
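A sketch of the same calculation, assuming 1 MB = 1,000 KB as in the worked answer (which is what gives X = 1.6 ms) and an illustrative 70 percent utilization cap for the last line; the helper name is hypothetical:

```python
def service_time_ms(seek_ms, rpm, io_kb, transfer_mb_per_s):
    """R_s = E + L + X, in milliseconds."""
    rotational_latency_ms = 0.5 * 60_000 / rpm   # half a revolution
    transfer_ms = io_kb / transfer_mb_per_s      # KB / (MB/s) = ms, taking 1 MB = 1,000 KB
    return seek_ms + rotational_latency_ms + transfer_ms

rs = service_time_ms(seek_ms=5, rpm=7200, io_kb=64, transfer_mb_per_s=40)
max_iops = 1000 / rs                             # IOPS = 1 / R_s (R_s in ms)
print(f"R_s = {rs:.2f} ms, max IOPS = {max_iops:.1f}")
print(f"IOPS at an assumed 70% utilization cap = {0.7 * max_iops:.0f}")
```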

6. Consider a disk I/O system in which an I/O request arrives at the rate of 80 IOPS. 
The disk service time is 6 ms. 

a. Compute the following: Utilization of I/O controller, Total response time, 
Average queue size, and Total time spent by a request in a queue. 

b. Compute the preceding parameters if the service time is halved.
Solution/Hint: 

Arrival rate $a = 80$ IOPS; consequently, the mean inter-arrival time is
$R_a = 1/a = 1/80$ s $= 12.5$ ms
$R_s = 6$ ms (given)

1. Utilization: $U = R_s/R_a = 6/12.5 = 0.48$, or 48%

2. Response time: $R = R_s/(1 - U) = 6/(1 - 0.48) = 11.5$ ms

3. Average queue size: $U^2/(1 - U) = (0.48)^2/(1 - 0.48) = 0.44$

4. Time spent by a request in the queue: $U \times R$, or the total response time
minus the service time, $= 0.48 \times 11.5 = 5.52$ ms

Now, if the controller power is doubled, the service time is halved;
consequently, $R_s = 3$ ms in this scenario.

1. Utilization: $U = R_s/R_a = 3/12.5 = 0.24$, or 24%

2. Response time: $R = R_s/(1 - U) = 3/(1 - 0.24) = 3.9$ ms

3. Average queue size: $U^2/(1 - U) = (0.24)^2/(1 - 0.24) = 0.08$

4. Time spent by a request in the queue: $U \times R = 0.24 \times 3.9 = 0.936$ ms
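The four formulas above can be checked with a short script. It is a sketch assuming the same single-queue model as the worked answer, with times in milliseconds; computing R before rounding gives slightly different last digits (for example, about 5.54 ms rather than 5.52 ms for the queueing time).

```python
def queue_metrics(arrival_iops: float, service_ms: float):
    """Return (U, R, average queue size, time in queue) for the model above."""
    ra_ms = 1000 / arrival_iops        # mean inter-arrival time R_a
    u = service_ms / ra_ms             # utilization U = R_s / R_a
    r = service_ms / (1 - u)           # response time R = R_s / (1 - U)
    queue_size = u ** 2 / (1 - u)      # average queue size U^2 / (1 - U)
    wait_ms = u * r                    # time in queue U * R (= R - R_s)
    return u, r, queue_size, wait_ms

for rs in (6, 3):
    u, r, q, w = queue_metrics(80, rs)
    print(f"R_s={rs} ms: U={u:.0%}, R={r:.2f} ms, queue={q:.2f}, wait={w:.2f} ms")
```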



7. Refer to Question 6 and plot a graph showing the response time and utilization, 
considering 20 percent, 40 percent, 60 percent, 80 percent, and 100 percent 
utilization of the I/O controller. Describe the conclusion that could be derived 
from the graph. 
Solution/Hint: 
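A minimal sketch that tabulates the points one would plot, assuming $R_s = 6$ ms from Question 6 and the same formula $R = R_s/(1 - U)$; it prints the values instead of drawing the graph so that it needs no plotting library. The numbers show the expected conclusion: response time grows slowly at low utilization and rises steeply as utilization approaches 100 percent (going from 40 to 80 percent triples R, from 10 ms to 30 ms), so response-time-sensitive workloads should keep controller utilization well below saturation.

```python
import math

SERVICE_TIME_MS = 6  # R_s from Question 6

def response_time_ms(utilization: float, rs_ms: float = SERVICE_TIME_MS) -> float:
    """R = R_s / (1 - U); unbounded as U approaches 1."""
    if utilization >= 1:
        return math.inf
    return rs_ms / (1 - utilization)

for pct in (20, 40, 60, 80, 100):
    print(f"{pct:3d}%  {response_time_ms(pct / 100):.1f} ms")
```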

Chapter 3

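1. Why is RAID 1 not a substitute for a backup?
Solution/Hint:

RAID 1 protects only against disk failure. Every write, including accidental
deletions and data corruption, is mirrored immediately to both disks, so RAID 1
cannot recover data to an earlier point in time, and it does not protect against a
disaster that affects the whole array or site.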

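2. Why is RAID 0 not an option for data protection and high availability?
Solution/Hint:

RAID 0 stripes data across disks without mirroring or parity, so it provides no
data protection: the failure of any single disk results in the loss of data in the
entire RAID set.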

3. Explain the process of data recovery in case of a drive failure in RAID 5. 
Solution/Hint: 

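The controller performs an XOR operation on the corresponding strips (data and
parity) of the remaining disks to regenerate the lost data, which is then written
to a replacement or hot spare disk.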
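To make the XOR rebuild concrete, here is a single-stripe sketch with made-up strip contents; it ignores parity rotation across disks, which RAID 5 uses but which does not change the arithmetic:

```python
from functools import reduce

def xor_strips(strips):
    """Bytewise XOR of equal-length strips (data or parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# One stripe across three data disks plus one parity disk (made-up contents).
data = [b"\x0f\xa0", b"\x33\x01", b"\xc4\x5e"]
parity = xor_strips(data)

# Disk 1 fails: XOR the surviving data strips with the parity strip.
rebuilt = xor_strips([data[0], data[2], parity])
print(rebuilt == data[1])
```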

4. What are the benefits of using RAID 3 in a backup application? 
Solution/Hint: 

Backup applications perform large sequential I/Os, and RAID 3 gives the best
results for large sequential I/O operations.

5. Discuss the impact of random and sequential I/O in different RAID 
configurations. 

Solution/Hint: 



RAID | Random I/O | Sequential I/O
3 | Good for random reads; poor to fair for small random writes. | Very good for sequential reads; good for large sequential writes.
5 | Very good for random reads; fair for random writes, which are slower due to parity overhead. | Good for sequential reads; fair to good for sequential writes.
1+0 | Very good for random writes. | Poor to fair for sequential writes.



6. A 10K rpm drive is rated to perform 130 IOPS, and a 15K rpm drive is rated to 
perform 180 IOPS for an application. The read/write ratio is 3:1. Compute the 
RAID-adjusted IOPS for the 10K and 15K drives for RAID 1, RAID 5, and RAID 
6. 

Solution/Hint: Question Invalid 



7. An application has 1,000 heavy users at a peak of 2 IOPS each and 2,000 typical 
users at a peak of 1 IOPS each, with a read/write ratio of 2:1. It is estimated that
the application also experiences an overhead of 20 percent for other workloads. 
Calculate the IOPS requirement for RAID 1, RAID 3, RAID 5, and RAID 6. 



Solution/Hint: 



1,000 heavy users at a peak of 2 IOPS each = 2,000 IOPS
2,000 typical users at a peak of 1 IOPS each = 2,000 IOPS

Assume a maximum concurrency of 90 percent:

(2,000 + 2,000) × 0.9 = 3,600 host IOPS for the 3,000 users during the peak
activity period

Read/write ratio = 2:1



8. For Question 7, compute the number of drives required to support the application 
in different RAID environments if 10K rpm drives with a rating of 130 IOPS per 
drive were used. 

Solution/Hint: 

Number of drives required to support the application in different RAID
environments, using 10K rpm drives rated at 130 IOPS per drive. The RAID-adjusted
IOPS equal the read IOPS plus the write IOPS multiplied by the write penalty (2 for
RAID 1, 4 for RAID 3 and RAID 5, 6 for RAID 6), and each drive count is rounded up:

For RAID 1: 3600 × 2/3 + 2 × (1/3 × 3600) = 4,800 IOPS; 4,800/130 = 36.9, so 37 drives
For RAID 3: 3600 × 2/3 + 4 × (1/3 × 3600) = 7,200 IOPS; 7,200/130 = 55.4, so 56 drives
For RAID 5: 3600 × 2/3 + 4 × (1/3 × 3600) = 7,200 IOPS; 7,200/130 = 55.4, so 56 drives
For RAID 6: 3600 × 2/3 + 6 × (1/3 × 3600) = 9,600 IOPS; 9,600/130 = 73.8, so 74 drives
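The RAID-adjusted IOPS and drive counts can be reproduced as follows, using the write penalties in the hint (treating RAID 3 like RAID 5), the 3,600 host IOPS, the 2:1 read/write ratio, and rounding each drive count up to a whole drive. The function name is illustrative:

```python
import math

# Write penalties used in the hint (RAID 3 treated like RAID 5).
WRITE_PENALTY = {"RAID 1": 2, "RAID 3": 4, "RAID 5": 4, "RAID 6": 6}

def raid_adjusted_iops(host_iops, read_part, write_part, penalty):
    """Back-end IOPS = reads + penalty * writes for a read:write ratio."""
    reads = host_iops * read_part // (read_part + write_part)
    writes = host_iops - reads
    return reads + penalty * writes

for level, penalty in WRITE_PENALTY.items():
    iops = raid_adjusted_iops(3600, 2, 1, penalty)
    drives = math.ceil(iops / 130)   # 10K rpm drives rated at 130 IOPS
    print(f"{level}: {iops} IOPS -> {drives} drives")
```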

Chapter 4

1. Consider a scenario in which an I/O request from track 1 is followed by an I/O
request from track 2 on a sector that is 180 degrees away from the first request. A 
third request is from a sector on track 3, which is adjacent to the sector on which 
the first request is made. Discuss the advantages and disadvantages of using the 
command queuing algorithm in this scenario. 

Solution/Hint: 

In this scenario, command queuing optimizes rotational latency: request 3 is
serviced before request 2, because its sector is adjacent to that of request 1,
whereas request 2 lies 180 degrees away. This reduces the total rotational delay.
If request 3 were on track 1 instead, command queuing would also optimize seek
time and further improve disk performance.

2. Which application benefits the most by bypassing the write cache, and why?

Solution/Hint: 

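Applications with very large writes. Such writes would quickly consume cache capacity, so writing them directly to disk (write-aside) keeps the cache available for other, smaller I/Os.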

3. An Oracle database uses a block size of 4 K for its I/O operation. The application 
that uses this database primarily performs a sequential read operation. Suggest 
and explain the appropriate values for the following cache parameters: cache page 
size, cache allocation (read versus write), pre-fetch type, and write aside cache. 

Solution/Hint: 

Cache page size: 4 KB

Cache allocation (read versus write): more read cache

Pre-fetch type: fixed pre-fetch

Write-aside cache: large value

Chapter 5

1. DAS provides an economically viable alternative to other storage networking
solutions. Justify this statement. 

Solution/Hint: 

- Setup requires a relatively low initial investment.

- Setup is managed using host-based tools, such as the host OS, which makes
storage management tasks easy for small and medium enterprises.

- It requires fewer management tasks, and fewer hardware and software elements
to set up and operate.

2. How is the priority sequence established in a wide SCSI environment? 
Solution/Hint: 

In wide SCSI, IDs 8 to 15 are ordered from 15 (highest) down to 8, but the entire
set of wide SCSI IDs has lower priority than the narrow SCSI IDs. Therefore, the
overall priority sequence for wide SCSI is 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12,
11, 10, 9, and 8.
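The priority order can be generated, and an arbitration between competing devices decided, with a few lines of Python; this is a sketch of the ordering rule only, not a model of the electrical arbitration phase:

```python
def wide_scsi_priority():
    """Bus-arbitration priority order, highest first, for a 16-bit wide SCSI bus."""
    narrow = list(range(7, -1, -1))   # IDs 7..0
    wide = list(range(15, 7, -1))     # IDs 15..8 rank below ID 0
    return narrow + wide

PRIORITY = wide_scsi_priority()

def arbitration_winner(requesting_ids):
    """Return the ID that wins arbitration among devices requesting the bus."""
    return min(requesting_ids, key=PRIORITY.index)

print(PRIORITY)
print(arbitration_winner([3, 15, 9]))
```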

3. Why is SCSI performance superior to that of IDE/ATA? Explain the reasons from
an architectural perspective. 

Solution/Hint: 

SCSI offers improved performance, expandability, and compatibility options,
making it suitable for high-end computers:

- A wide SCSI bus supports up to 16 devices.
- The SCSI architecture is based on a client-server relationship: a SCSI initiator
(client) sends a request to a SCSI target (server), and the target performs the
requested tasks and returns the output to the initiator.
- When a device is initialized, SCSI allows automatic assignment of device IDs on
the bus, which prevents two or more devices from using the same SCSI ID.

4. Research blade server architecture and discuss the limitations of DAS for this 
architecture. 

Solution/Hint: DAS limitations

- DAS does not scale well: a storage device has a limited number of ports.

- Limited bandwidth in DAS restricts the available I/O processing capability.

- DAS is subject to distance limitations.

- Resources cannot be shared, and unused resources cannot easily be reallocated,
resulting in islands of over-utilized and under-utilized storage pools.

5. What would you consider while choosing serial or parallel data transfer in a DAS 
implementation? Explain your answer and justify your choice. 

Solution/Hint: 

Distance and speed are the key factors:
- Serial data transfer can overcome distance limitations.
- Parallel data transfer can overcome speed limitations.
- Parallel data transfer is suitable for internal DAS.
- Serial data transfer is suitable for external DAS.



6. If three hard disk drives are connected in a daisy chain and communicate over
SCSI, explain how the CPU will perform I/O operations with a particular device.
Solution/Hint: 

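Each disk is identified by a cntndn address, where cn is the controller
(initiator) number, tn is the target ID, and dn is the device (LUN) number.

Because the three disks are daisy-chained on the same bus, they share the same cn,
while each disk has its own unique target ID (tn); the CPU addresses a particular
disk through the controller using that target ID.

SCSI commands and responses are used for the communication.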

Chapter 6

1. What is zoning? Discuss a scenario:

(i) where soft zoning is preferred over hard zoning. 

(ii) where hard zoning is preferred over soft zoning. 
Solution/Hint: 

Zoning is an FC switch function that enables nodes within the fabric to be 
logically segmented into groups that can communicate with each other. A zone 
consists of selected devices, such as host bus adapters (HBAs) and storage 
devices, in the fabric. Devices assigned to one zone can communicate with other 
devices in the same zone, but not with devices in zones of which they are not 
members. This zoning practice provides a fast, efficient, and reliable means of 
controlling the HBA discovery/login process. Without zoning, the HBA will 
attempt to log in to all ports on the fabric during discovery and during the HBA's 
response to a state change notification. With zoning, the time and Fibre Channel 
bandwidth required to process discovery and the state change notification are 
minimized. 

(i) Soft zoning, also called WWN zoning, is preferred when users need the
flexibility to move attached nodes physically between switch ports or to recable
the SAN, for example during switch maintenance and repair, without reconfiguring
the zone information. This is possible because the WWN is static to the node port.

(ii) Hard zoning, also called port zoning, is convenient when hardware such as an
HBA must be replaced. Because zone membership is tied to the switch port rather
than to the WWN, which is unique to each piece of hardware, the replacement device
can use the same port without any change to the zone configuration.

2. Describe the process of assigning FC address to a node when logging in to the 
network for the first time. 

Solution/Hint:

To log on to the fabric, a node sends a FLOGI frame with the World Wide Node 
Name (WWNN) and World Wide Port Name (WWPN) parameters to the login 
service at the well-known FC address FFFFFE. In turn, the switch accepts the 
login and returns an Accept (ACC) frame with the assigned FC address for the 
device. Immediately after the FLOGI, the N_port registers itself with the local 
name server on the switch, indicating its WWNN, WWPN, and assigned FC 
address. 

3. Seventeen switches, with 16 ports each, are connected in a mesh topology. How 
many ports are available for host and storage connectivity if you create a high- 
availability solution? 

Solution/Hint: 

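- Total ports = 17 × 16 = 272

- Number of ISLs in a full mesh = 17 × 16 / 2 = 136

- Each ISL consumes 2 ports (one on each switch), so the ISLs consume
136 × 2 = 272 ports

- Number of ports available for hosts and storage = 272 - 272 = 0

Each switch uses all 16 of its ports for ISLs to the other 16 switches, so no
ports remain for host and storage connectivity.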
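A quick check of the port arithmetic, assuming a full mesh with exactly one ISL between every pair of switches:

```python
def full_mesh_ports(switches: int, ports_per_switch: int) -> tuple[int, int, int]:
    """Return (total ports, ISL count, ports left for hosts and storage)."""
    total = switches * ports_per_switch
    isls = switches * (switches - 1) // 2     # one ISL between every pair of switches
    free = total - 2 * isls                   # each ISL occupies a port on both ends
    return total, isls, free

print(full_mesh_ports(17, 16))   # (272, 136, 0)
```

The same helper shows how quickly a full mesh eats ports: with only 4 switches of 16 ports, 52 of the 64 ports remain free.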

4. Discuss the advantage of FC-SW over FC-AL. 



Solution/Hint:

Unlike a loop configuration, a Fibre Channel switched fabric (FC-SW) network
provides interconnected devices, dedicated bandwidth, and scalability. The addition
or removal of a device in a switched fabric is minimally disruptive; it does not
affect the ongoing traffic between other devices, as it does in FC-AL. In FC-SW,
nodes do not share a loop; instead, data is transferred through a dedicated path
between the nodes. FC-SW uses a 24-bit Fibre Channel addressing scheme, which can
support up to 15 million devices, whereas FC-AL uses 8-bit addressing, which can
support up to 127 devices on a loop.

5. How does flow control work in an FC network?
Solution/Hint:

Flow control defines and regulates the pace of the flow of data frames between
sender and receiver during data transmission. FC technology uses two flow-control
mechanisms: buffer-to-buffer credit (BB_Credit) and end-to-end credit (EE_Credit).

BB_Credit: The process is as follows:

1. At login, the transmitter and the receiver exchange parameters and establish the
BB_Credit value.

2. The transmitter's count is initialized to the BB_Credit value.

3. If the count is greater than 0, the transmitter sends a frame and decrements the
count for each transmitted frame.

4. The receiving port sends an R_RDY (Receiver Ready) to the transmitting port each
time a receive buffer becomes available.

5. The transmitter increments the count by 1 for each R_RDY it receives from the
receiver, so the transmitting port maintains a count of free receiver buffers.

6. Upon a link reset, the count resets to the BB_Credit value negotiated at login.
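The transmitter's side of the BB_Credit steps can be sketched as a small counter class. This is a toy model for illustration; the class and method names are hypothetical, and real ports implement this logic in hardware:

```python
class BBCreditTransmitter:
    """Toy model of a transmitting port's BB_Credit counter."""

    def __init__(self, negotiated_credit: int):
        self.negotiated = negotiated_credit   # BB_Credit agreed at login
        self.credit = negotiated_credit       # count starts at the BB_Credit value

    def try_send(self) -> bool:
        """Send one frame if credit remains; otherwise the port must wait."""
        if self.credit > 0:
            self.credit -= 1
            return True
        return False

    def receive_r_rdy(self) -> None:
        self.credit += 1                      # one receive buffer freed

    def link_reset(self) -> None:
        self.credit = self.negotiated         # back to the login value

tx = BBCreditTransmitter(negotiated_credit=2)
print([tx.try_send() for _ in range(3)])      # third frame is held back
tx.receive_r_rdy()
print(tx.try_send())
```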

EE_Credit:

The function of end-to-end credit (EE_Credit) is similar to that of BB_Credit,
but it operates between the source and destination N_Ports rather than between
adjacent ports. The EE_Credit mechanism is used for flow control of class 1 and
class 2 traffic only.

6. Why is class 3 service most preferred for FC communication? 
Solution/Hint: 

- Connectionless: no dedicated connection is required
- Uses only BB_Credit flow control, with no end-to-end acknowledgments
- High bandwidth utilization
- Support for multiplexing

Chapter 7

1. List and explain the considerations for capacity design for both CPU and storage
in a NAS environment. 

Solution/Hint: 

Capacity scalability can be a constraint in an integrated NAS, whereas a gateway
NAS can scale for the required connectivity and storage. A gateway NAS shares CPU
load with the SAN workload and may face constraints on CPU utilization. Typically,
CPU capacity is not the major bottleneck in a NAS system; other considerations,
such as memory requirements, the number of network ports, the file system, and
IOPS requirements, along with storage capacity requirements, are important factors.

2. SAN is configured for backup to a disk environment, and the storage 
configuration has additional capacity available. Can you have a NAS gateway 
configuration use this SAN? Discuss the implications of sharing the backup-to- 
disk SAN environment with NAS. 

Solution/Hint: 

Because additional capacity is available, a gateway NAS can be implemented on
this SAN. However, during the backup window, backup traffic will have a
considerable performance impact on the shared SAN and network, and hence on NAS.

3. Explain how the performance of NAS can be affected if the TCP window size at 
the sender and the receiver are not synchronized. 

Solution/Hint: 

A mismatch affects NAS performance because it can lead to retransmission of
data, lower bandwidth utilization, performance degradation of the network,
intermittent connectivity, and data link errors.

4. Research the use of baby jumbo frames and how it affects NAS performance. 
Solution/Hint: 

Common Ethernet jumbo frame size: 9,000 bytes
Baby jumbo frame size: about 2.5 KB

5. Research the file sharing features of the NFS protocol. 

Hint: Research question (refer to NFS protocol section in the book) 

6. A NAS implementation configured jumbo frames on the NAS head, with 9,000 as 
its MTU. However, the implementers did not see any performance improvement 
and actually experienced performance degradation. What could be the cause? 
Research the end-to-end jumbo frame support requirements in a network. 

Solution/Hint: 

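Jumbo frames are enabled only at the end point (the NAS head), with an MTU of 9,000.
Check whether the intermediate network uses a different MTU size (for example,
1,500). If it does, routers may drop the oversized packets, which then have to be
retransmitted at the TCP layer, or fragment them, so that they must be reassembled
to accommodate the different MTU sizes. Both effects degrade network performance.
Jumbo frames help only when every device in the path (NICs, switches, routers, and
storage) supports the same large MTU end to end.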

7. Acme Corporation is trying to decide between an integrated or a gateway NAS 
solution. The existing SAN at Acme will provide capacity and scalability. The IT 
department is considering a NAS solution for the Training department at Acme 
for training videos. The videos would only be used by the training department for 
evaluation of instructors. Pick a NAS solution and explain the reasons for your 
choice. 
Solution/Hint: 

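Acme already has a SAN that provides capacity and scalability, so the choice is a
gateway NAS, which uses the SAN's storage.
Check whether the SAN has sufficient additional capacity available for the videos.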

Chapter 8

1. How does iSCSI handle the process of authentication? Research the available
options. 

Solution/Hint:

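CHAP (Challenge Handshake Authentication Protocol). The iSCSI specification (RFC 3720) also defines Kerberos (KRB5), SPKM1, SPKM2, and SRP as authentication methods, as well as None.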
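CHAP never sends the secret itself; the initiator proves it knows the secret by hashing it together with a challenge. The sketch below follows the MD5 variant defined in RFC 1994, with a hypothetical secret and identifier; in iSCSI these values are exchanged as login text keys (CHAP_I, CHAP_C, CHAP_R), which the sketch does not model:

```python
import hashlib
import hmac
import os

def chap_md5_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response with MD5 (RFC 1994): MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Hypothetical shared secret, configured on both initiator and target.
secret = b"example-chap-secret"

# Target: send a one-octet identifier and a random challenge to the initiator.
identifier, challenge = 7, os.urandom(16)

# Initiator: prove knowledge of the secret without sending it.
response = chap_md5_response(identifier, secret, challenge)

# Target: recompute the expected value and compare in constant time.
expected = chap_md5_response(identifier, secret, challenge)
print("authenticated:", hmac.compare_digest(response, expected))
```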

2. List some of the data storage applications that could benefit from an IP SAN 
solution. 

Solution/Hint: 

Extending reach of existing SAN 
Disaster recovery solutions 
Remote office applications 

3. What are the major performance considerations for FCIP? 
Solution/Hint: 

Refer to Section 8.2.2, "FCIP Performance and Security."

4. Research the multipathing software available for an iSCSI environment. Write a 
technical note on the features and functionality of EMC PowerPath support for 
iSCSI. 

Hint: Research work 

5. Research the iSCSI capabilities in a NAS device; provide use case examples. 
Hint: Research work 

6. A company is considering implementing storage. They do not have a current 
storage infrastructure to use, but they have a network that gives them good 
performance. Discuss whether native or bridged iSCSI should be used and explain 
your recommendation. 

Solution/Hint: 

A native iSCSI implementation does not involve any FC components, whereas a
bridged iSCSI implementation does.

Because the company has no existing storage infrastructure and its network already
gives good performance, native iSCSI should be deployed. In this case, only
iSCSI-enabled storage is needed to implement native iSCSI.

7. The IP bandwidth provided for FCIP connectivity seems to be constrained. 
Discuss its implications if the SANs that are merged are fairly large, with 500 
ports on each side, and the SANs at both ends are constantly reconfigured. 

Solution/Hint: 

Because the bandwidth is insufficient, the IP network will become the bottleneck.
Because the fabrics on both sides are fairly large and are constantly reconfigured,
any disruption in the IP network will lead to instabilities in the unified fabric,
including a segmented fabric, excessive RSCNs, and host timeouts.

One solution is to segregate FCIP traffic into a separate virtual fabric to
provide additional stability.



8. Compared to a standard IP frame, what percentage of reduction can be realized in 
protocol overhead in an iSCSI configured to use jumbo frames with an MTU 
value of 9,000? 

Solution/Hint: 

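With a standard 1,500-byte MTU, the iSCSI PDU carried in each frame is 1,460 bytes
(payload plus additional header segments), because 40 bytes are taken by the IP
and TCP headers.
With jumbo frames (MTU of 9,000 bytes), the payload is 8,960 bytes.
Jumbo frames therefore allow significantly more payload to be delivered in each
iSCSI PDU, so the same amount of data needs only about one-sixth as many frames
and headers: a reduction in protocol overhead of about 84 percent
(1 - 1,460/8,960).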
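The percentage can be estimated with a short calculation. This sketch assumes that only the 40 bytes of IPv4 and TCP headers count as overhead and ignores Ethernet framing and the iSCSI header; under those assumptions the reduction comes to about 83.7 percent:

```python
IP_TCP_HEADER_BYTES = 40   # IPv4 (20) + TCP (20); Ethernet and iSCSI headers ignored

def header_overhead(mtu: int) -> float:
    """Header bytes carried per payload byte for a given MTU."""
    payload = mtu - IP_TCP_HEADER_BYTES
    return IP_TCP_HEADER_BYTES / payload

standard = header_overhead(1500)   # 1,460-byte payload
jumbo = header_overhead(9000)      # 8,960-byte payload
reduction = 1 - jumbo / standard
print(f"overhead {standard:.2%} -> {jumbo:.2%}, reduction {reduction:.1%}")
```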

9. Why should an MTU value of at least 2,500 be configured in a bridged iSCSI 
environment? 

Solution/Hint: 

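A full-size FC frame is up to 2,148 bytes. An MTU of at least 2,500 bytes lets the data carried in a full FC frame travel in a single IP packet, together with the TCP/IP and iSCSI headers, without fragmentation.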

Chapter 9

1. Explain how a CAS solution fits into the ILM strategy.
Solution/Hint: 

According to the ILM strategy, the value of information changes over its
lifecycle. When information is created, its value is very high and it is frequently
accessed and changed, so it is placed on high-performance, costly storage. Over
time, its value drops and it becomes fixed content that is rarely accessed but
still occupies costly storage space. To optimize cost, less frequently accessed data
should be moved to archive storage, leaving the costly space for high-value data.
CAS is a solution for archived data that not only provides a cost benefit but also
provides faster access to, and reliable storage of, fixed content.

2. To access data in a SAN, a host uses a physical address known as a logical block 
address (LBA). A host using a CAS device does not use (or need) a physical 
address. Why? 

Solution/Hint: 

Unlike file-level and block-level data access that use file names and the 
physical location of data for storage and retrieval, CAS stores data and its 
attributes as an object. The stored object is assigned a globally unique 
address known as a content address which is derived from the actual binary 
representation of stored data. 
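A content address can be illustrated with a toy store that uses a cryptographic hash of the object as its address. SHA-256 is an assumption made for this sketch; commercial CAS products may use different hash functions and address formats. Storing identical content twice returns the same address and keeps a single copy:

```python
import hashlib

class ToyCAS:
    """Minimal content-addressed store: an object's address is a hash of its bytes."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(address, data)   # identical content is stored once
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

cas = ToyCAS()
a1 = cas.put(b"scanned invoice #1")
a2 = cas.put(b"scanned invoice #1")   # same content -> same address
print(a1 == a2, len(cas._objects))
```

Because the address is computed from the bytes, the host never needs an LBA: it hands the store an address, and the store can verify on every read that the content still matches it.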

3. The IT department of a departmental store uses tape to archive data. Explain 4-5 
major points you could provide to persuade the IT department to move to a CAS 
solution. How would your suggestions impact the IT department? 

Solution/Hint: 

- Guaranteed content authenticity and integrity: data cannot be altered once
stored, which helps meet regulatory and business compliance requirements.

- Single-instance storage: simplifies storage resource management, especially when
handling large amounts of fixed content.

- Faster data retrieval compared to tape.

- Technology independence: as long as the application server can map the original
content address, the data remains accessible.

- Better data protection and disposition: all fixed content is stored in CAS once
and is protected by a data protection scheme.

Chapter 10

1. What do VLANs virtualize? Discuss VLAN implementation as a virtualization
technology. 

Solution/Hint: 

VLAN stands for virtual LAN. A VLAN has the same attributes as a physical LAN, but it allows
hosts to be grouped together even if they are not located on the same network switch.
With the use of network reconfiguration software, ports on the layer 2 switch can be 
logically grouped together, forming a separate, Virtual Local Area Network. VLANs help 
to simplify network administration. Ports in a VLAN can be limited to only the number 
needed for a particular network. This allows unused ports to be used in other VLANs. 
Through software commands, additional ports can be added to an existing VLAN if 
further expansion is needed. If a machine needs to be moved to a different IP network, 
the port is reassigned to a different VLAN and there is no need for the physical 
movement of cables. 

3. How can a block-level virtualization implementation be used as a data migration tool? 
Explain how data migration will be accomplished and discuss the advantages of using 
this method for storage. Compare this method to traditional migration methods. 

Solution/Hint: 

Conventionally, data migration requires physically remapping servers to the new
storage location, which results in application downtime and physical changes. In a
virtualized environment, virtual volumes are assigned to the host out of a physical
pool of storage capacity, and data migration is achieved through these virtual
volumes. To move a virtual volume, the virtualization software redirects I/O from
one physical location to another. Although the I/O is physically redirected to a
new location, the address of the virtual volume presented to the host never
changes; this is accomplished through virtual addressing. The process is therefore
transparent and nondisruptive to the host. Additionally, because the copying and
remapping are done by the virtualization system, no host cycles are required,
freeing servers to be dedicated to their application-centric functions.
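The redirection described above can be sketched as a mapping table between virtual and physical block addresses. The class below is a hypothetical, single-threaded illustration (real virtualization appliances must also handle writes that arrive during the copy): the host reads through the same virtual addresses before and after migration, and only the mapping changes.

```python
class VirtualVolume:
    """Toy block-virtualization layer; the host only ever sees virtual block addresses."""

    def __init__(self, array_name, blocks):
        self.arrays = {array_name: list(blocks)}
        # virtual block address -> (array name, physical block address)
        self.map = {vba: (array_name, vba) for vba in range(len(blocks))}

    def read(self, vba):
        array, pba = self.map[vba]
        return self.arrays[array][pba]

    def migrate(self, target_array):
        """Copy all blocks to target_array, then switch the mapping in one step."""
        vbas = sorted(self.map)
        self.arrays[target_array] = [self.read(vba) for vba in vbas]
        self.map = {vba: (target_array, i) for i, vba in enumerate(vbas)}

vol = VirtualVolume("array_A", ["blk0", "blk1", "blk2"])
before = [vol.read(vba) for vba in range(3)]
vol.migrate("array_B")
after = [vol.read(vba) for vba in range(3)]
print(before == after, vol.map[0])
```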

Chapter 11

1. A network router has a failure rate of 0.02 percent per 1,000 hours. What is the
MTBF of that component? 

Solution/Hint: 

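MTBF of the network router = 1 / failure rate

Failure rate = 0.02 percent per 1,000 hours = 0.0002 / 1,000 hours = $2 \times 10^{-7}$ per hour
MTBF = $1 / (2 \times 10^{-7})$ = 5,000,000 hours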
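The unit conversion is where mistakes tend to creep in, so here it is step by step:

```python
failure_rate_percent = 0.02    # percent of failures ...
per_hours = 1_000              # ... per 1,000 operating hours

failure_rate_per_hour = (failure_rate_percent / 100) / per_hours   # 2e-7
mtbf_hours = 1 / failure_rate_per_hour
print(f"MTBF = {mtbf_hours:,.0f} hours")
```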

2. The IT department of a bank promises customer access to the bank rate table 
between 9:00 a.m. and 4:00 p.m. from Monday to Friday. It updates the table 
every day at 8:00 a.m. with a feed from the mainframe system. The update 
process takes 35 minutes to complete. On Thursday, due to a database corruption, 
the rate table could not be updated, and at 9:05 a.m., it was established that the 
table had errors. A rerun of the update was done, and the table was recreated at 
9:45 a.m. Verification was run for 15 minutes, and the rate table became available 
to the bank branches. What was the availability of the rate table for the week in 
which this incident took place, assuming there were no other issues? 

Solution/Hint: 

Availability = total uptime / total scheduled time
Total scheduled time = 7 hours/day × 5 days = 35 hours

Total uptime = 34 hours (on Thursday, the rate table became available at 10:00 a.m.,
after the 9:45 a.m. rebuild and 15 minutes of verification, instead of at 9:00 a.m.)

Therefore, availability of the rate table for the week = 34/35 ≈ 97.1 percent.
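The same result computed from the clock times in the question (a sketch; the helper name is illustrative):

```python
from datetime import datetime, timedelta

def clock(hhmm: str) -> datetime:
    return datetime.strptime(hhmm, "%H:%M")

daily_window = clock("16:00") - clock("09:00")        # 7 hours of promised access
scheduled = 5 * daily_window                          # Monday to Friday: 35 hours
restored = clock("09:45") + timedelta(minutes=15)     # rebuild done + 15 min verification
downtime = restored - clock("09:00")                  # Thursday outage: 1 hour
availability = (scheduled - downtime) / scheduled
print(f"{availability:.2%}")
```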

3. "Availability is expressed in terms of 9s." Explain the relevance of the use of 9s
for availability, using examples. 

Solution/Hint: 

Uptime per year is based on the exact timeliness requirements of the service;
this calculation leads to the number-of-"9s" representation of availability
metrics.

For example, a service that is said to be "five 9s available" is available for
99.999 percent of the scheduled time in a year (24 hours a day, 7 days a week,
365 days a year).

Uptime (%) | Downtime (%) | Downtime per year | Downtime per week
99.999 | 0.001 | 5.26 minutes | 6 seconds
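The downtime budget for any number of 9s follows from the same arithmetic; this sketch assumes a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
SECONDS_PER_WEEK = 7 * 24 * 3600   # 604,800

def downtime(uptime_pct: float) -> tuple[float, float]:
    """Return allowed downtime as (minutes per year, seconds per week)."""
    down_fraction = 1 - uptime_pct / 100
    return down_fraction * MINUTES_PER_YEAR, down_fraction * SECONDS_PER_WEEK

for pct in (99.0, 99.9, 99.99, 99.999):
    per_year, per_week = downtime(pct)
    print(f"{pct}% uptime: {per_year:.2f} min/year, {per_week:.1f} s/week")
```

Each additional 9 cuts the allowed downtime by a factor of ten, which is why the number of 9s is a compact way to compare service levels.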

4. Provide examples of planned and unplanned downtime in the context of data 
center operations. 

Solution/Hint: 

- Examples of planned downtime: installation, integration, or maintenance of new
hardware; software upgrades or patches; taking backups; application and data
restores; facility operations (renovation and construction); and refresh or
migration of the testing environment to the production data.

- Examples of unplanned downtime: failures caused by database corruption,
component failure, and human error.

5. How does clustering help to minimize RTO? 
Solution/Hint: 

RTO of 1 hour: Cluster production servers with controller-based disk 
mirroring. 

RTO of a few seconds: Cluster production servers with bidirectional 
mirroring, enabling the applications to run at both sites simultaneously. 

6. How is the choice of a recovery site strategy (cold and hot) determined in relation 
to RTO and RPO? 

Solution/Hint: 

- Small RTO and RPO: hot site

- Large RTO and RPO: cold site

7. Assume the storage configuration design shown in the following figure: 



Perform the single point of failure analysis for this configuration and provide an 
alternate configuration that eliminates all single points of failure. 

Solution/Hint: 

Single points of failure: host, HBA, switch, path, array port, and
storage array

Alternate configuration to avoid single points of failure:

[Figure: host connected to the storage array through redundant FC switches]
Chapter 12

1. A manufacturing corporation uses tape as its primary backup storage media
throughout the organization: 

• Full backups are performed every Sunday. 

• Incremental backups are performed Monday through Saturday. 

• The environment contains many backup servers, backing up different 
groups of servers. 

• The e-mail and database applications have to be shut down during the 
backup process. 

Due to the decentralized backup environment, recoverability is often
compromised. Too many tapes need to be mounted to perform a full
recovery in case of a complete failure, and the time needed to recover is too long.
The company would like to deploy an easy-to-manage backup environment. They 
want to reduce the amount of time the e-mail and database applications are 
unavailable, and reduce the number of tapes required to fully recover a server in 
case of failure. 

Propose a backup and recovery solution to address the company's needs. Justify 
how your solution ensures that their requirements will be met. 

Solution/Hint: 

The solution should have the following elements: 

• Centralized backup server 

• Backup agents to avoid the requirement for critical applications to be
shut down during the backup process.

• Use of a cumulative backup policy instead of incremental backups, 
reducing the amount of tape required for a full restore. 
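The effect of the cumulative policy on a full restore can be shown by counting the backup sets that must be mounted. This is a sketch under stated assumptions: a full backup every Sunday, one backup per day Monday through Saturday, and a failure after Saturday's backup. It counts backup sets, each of which may occupy one or more tapes.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def backup_sets_for_restore(failure_day, policy):
    """Return the backup sets needed to restore data as of the end of
    `failure_day`, given a full backup every Sunday.

    incremental: each daily backup holds only the changes since the previous
                 backup, so every one since Sunday is needed.
    cumulative:  each daily backup holds all changes since the last full,
                 so only the most recent one is needed.
    """
    day = DAYS.index(failure_day)
    sets = ["full (Sun)"]
    if day == 0:
        return sets
    if policy == "incremental":
        sets += [f"incremental ({d})" for d in DAYS[1:day + 1]]
    elif policy == "cumulative":
        sets.append(f"cumulative ({DAYS[day]})")
    else:
        raise ValueError(f"unknown policy: {policy}")
    return sets

for policy in ("incremental", "cumulative"):
    print(policy, len(backup_sets_for_restore("Sat", policy)), "backup sets")
```

With a Saturday failure, the incremental policy needs 7 backup sets (the full plus six incrementals), while the cumulative policy needs only 2 (the full plus Saturday's cumulative backup). The trade-off is that cumulative backups grow larger each day of the week.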

2. There are limited backup devices in a file sharing NAS environment. Suggest a 
suitable backup implementation that will minimize the network traffic, avoid any 
congestion, and at the same time not impact the production operations. Justify 
your answer. 

Solution/Hint: 

This can be achieved by using NDMP (Network Data Management Protocol) to transport data between
the NAS device and the backup devices. With NDMP, it is no longer necessary to
transport the data through the backup server. Data is sent from the filer directly to
the backup device, while metadata is sent to the backup server for tracking
purposes. This solution meets the strategic need to centrally manage and control
distributed data, while minimizing network traffic. NDMP 3-way is useful when
there are limited backup devices in the environment: the NAS device
controlling the backup device can share it with other NAS devices by receiving
their backup data via NDMP.



3. Discuss the security concerns in a backup environment.



Solution/Hint: 

A major security concern in a backup environment is an unauthorized host spoofing the identity of the backup server, backup client, or
backup node to gain access to backup data. Another concern is
a backup tape being lost, stolen, or misplaced, especially if the tape contains highly confidential
information. Backup-to-tape applications are also vulnerable to security risks if they
do not encrypt data while backing up. Lastly, backup data shredding should also be
considered: tapes that are no longer required should be safely erased or overwritten.

4. What are the various business/technical considerations for implementing a backup
solution, and how do these considerations impact the backup solution/implementation?

Solution/Hint: 

-> RTO and RPO are the primary considerations in selecting and implementing a
specific backup strategy.

-> Retention period

-> Backup media type

-> Backup granularity

-> Time for performing backup and available backup window

-> Location and time of the restore operation

-> File characteristics (location, size, and number of files) and data compression
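To see how data size, throughput, compression, and the backup window interact, the check below estimates backup duration. It is a sketch under simplifying assumptions: throughput is a constant sustained rate, compression reduces the amount written to the media by the given ratio, and the 2 TB, 120 MB/s, and 4-hour figures are hypothetical.

```python
def backup_duration_hours(data_gb, throughput_mb_per_s, compression_ratio=1.0):
    """Estimate backup time in hours.

    Assumes a constant sustained throughput and that compression reduces the
    amount of data written to the backup media by `compression_ratio`.
    """
    mb_to_write = data_gb * 1024 / compression_ratio
    return mb_to_write / throughput_mb_per_s / 3600

def fits_window(data_gb, throughput_mb_per_s, window_hours, compression_ratio=1.0):
    """True if the estimated backup completes within the backup window."""
    return backup_duration_hours(data_gb, throughput_mb_per_s, compression_ratio) <= window_hours

# Hypothetical example: 2 TB of data, 120 MB/s sustained, 4-hour window.
print(round(backup_duration_hours(2048, 120), 2))
print(fits_window(2048, 120, 4))
print(fits_window(2048, 120, 4, compression_ratio=2.0))
```

In this example the uncompressed backup takes about 4.85 hours and misses the window, while 2:1 compression brings it to about 2.43 hours. A real sizing exercise would also account for file count, since many small files usually lower effective throughput.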

5. What is the purpose of performing operational backup, disaster recovery,
and archiving?

Solution/Hint:

Operational backup: To restore data in the event of data loss or logical corruption
Disaster recovery: To restore data at an alternate site when the primary site is
incapacitated due to a disaster
Archiving: For long-term data retention (regulatory compliance or business
requirement)

6. List and explain the considerations in using tape as the backup technology. What 
are the challenges in this environment? 

Solution/Hint: 

Advantages: 

- Offsite data copy 

- Lower initial cost 
Challenges: 

- Reliability 

- Restore performance (mount, load to ready, rewind, dismount times) 

- Sequential access

- Requires an HVAC-controlled environment

- Shipping/handling challenges



7. Describe the benefits of using "virtual tape library" over "physical tapes."

Solution/Hint:

Features               Tape                                Virtual Tape
Offsite capabilities   Yes                                 Yes
Reliability            No inherent protection methods      RAID, spare
Performance            Subject to mechanical operations,   Faster single stream
                       load times
Use                    Backup only                         Backup only



Chapter 13 



1. What is the importance of recoverability and consistency in local replication?
Solution/Hint: 

Recoverability enables restoration of data from the replicas to the production 
volumes in the event of data loss or data corruption. 
Recoverability must provide minimal RPO and RTO for resuming 
business operations on the production volumes. 

Consistency ensures restartability from the replica data. Business operations cannot
resume from inconsistent data.

2. Describe the uses of a local replica in various business operations. 

Solution/Hint: 

- Alternate source for backup
- Fast recovery
- Decision support activities such as reporting
- Testing platform
- Data migration

3. What are the considerations for performing backup from a local replica? 

Solution/Hint: 

The replica should be a consistent point-in-time (PIT) copy of the source.
The replica should not be updated while the backup window is open.

4. What is the difference between a restore operation and a resynchronization 
operation with local replicas? Explain with examples. 

Solution/Hint: 

Restore operation 

Source is synchronized with the target data 

For example, if the source contains a database in which a logical data corruption
occurs, the data can be recovered by attaching the latest PIT replica of the
source and performing an incremental restore operation.
Resynchronization operation

Target is synchronized with the source data

For example, after the target is detached from the source, both source and
target data are updated by the host. After some time, the target needs to be
synchronized with the source data. For that, the target is attached to the
source again and an incremental resynchronization is performed.
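Both operations are "incremental" because only blocks that changed since the target was detached need to be copied. The sketch below models that idea with in-memory lists and sets of changed block numbers, which stand in for the change-tracking bitmaps an array would keep; all function names are hypothetical. A block changed on either side no longer matches, so it is copied; the only difference between restore and resynchronization is the direction of the copy.

```python
def incremental_copy(src, dst, changed_on_src, changed_on_dst):
    """Copy only the blocks that changed on either side since the split.

    Returns the number of blocks copied. All other blocks are still
    identical on both sides and are left untouched.
    """
    dirty = changed_on_src | changed_on_dst
    for block in dirty:
        dst[block] = src[block]
    return len(dirty)

def resynchronize(source, target, src_changes, tgt_changes):
    """Resynchronization: the target is updated from the source."""
    return incremental_copy(source, target, src_changes, tgt_changes)

def restore(source, target, src_changes, tgt_changes):
    """Restore: the source is updated from the target (the PIT replica)."""
    return incremental_copy(target, source, tgt_changes, src_changes)

# Restore after a logical corruption on the source (replica left untouched).
source = ["A", "B", "C", "D"]
replica = list(source)            # PIT copy taken before the corruption
source[1] = "corrupt"             # logical corruption on the source
copied = restore(source, replica, src_changes={1}, tgt_changes=set())
print(copied, source)             # 1 ['A', 'B', 'C', 'D']
```

Only one block is copied back, instead of the whole volume, which is why incremental restore and resynchronization are much faster than a full copy.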

5. A 300 GB database needs two local replicas for reporting and backup. There are 
constraints in provisioning full capacity for the replicas. It has been determined 
that the database has been configured on 15 disks, and the daily rate of change in 
the database is approximately 25 percent. You need to configure two pointer-based
replicas for the database. Describe how much capacity you would allocate
for these replicas and how many save volumes you would configure.

Solution/Hint: 

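Approximately 75 GB of save volume capacity is required (25 percent daily
rate of change x 300 GB).

Only this save capacity, rather than the full source capacity, needs to be
allocated, since these are pointer-based replicas.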
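The capacity figures can be compared in a few lines. This sketch sizes the save volumes, as the hint does, at one day of change (25 percent of 300 GB), and contrasts that with the capacity two full-volume replicas would need.

```python
DB_SIZE_GB = 300
DAILY_CHANGE_RATE = 0.25       # approximately 25 percent per day
REPLICAS = 2

# Pointer-based: only blocks that change on the source are preserved in the
# save volumes, so capacity is sized on the rate of change (as in the hint).
save_capacity_gb = DB_SIZE_GB * DAILY_CHANGE_RATE
# Full-volume mirrors: every replica needs the full source capacity.
full_volume_gb = DB_SIZE_GB * REPLICAS

print(f"save volumes: {save_capacity_gb:g} GB vs full volumes: {full_volume_gb} GB")
```

This prints `save volumes: 75 GB vs full volumes: 600 GB`, which is the capacity saving that motivates pointer-based replicas when full capacity cannot be provisioned.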

6. For the same database described in Question 5, discuss the advantages of 
configuring full-volume mirroring if there are no constraints on capacity. 

Solution/Hint: 

With full-volume mirroring, the replica holds a complete copy of the data, so the source need not be up or healthy for recovery from the replica.

7. An administrator configures six snapshots of a LUN and creates eight clones of 
the same LUN. The administrator then creates four snapshots for each clone that 
was created. How many usable replicas are now available? 

Solution/Hint: 

- Usable replicas = 6 + 8 + 32 = 46 

8. Refer to Question 5. Having created the two replicas for backup and reporting 
purposes, assume you are required to automate the processes of backup and 
reporting from the replicas by using a script. Develop a script in a pseudo 
language (you can use the standard TimeFinder commands for the operations you
need to perform) that will fully automate backup and reporting. Your script 
should perform all types of validations at each step (e.g., validating whether a 
synchronization process is complete or a volume mount is successfully done). 

Solution/Hint: 



Create a flow chart in simple language. 
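The same flow can also be sketched as a short program that validates every step. TimeFinder command syntax is not reproduced here: the `SimulatedReplica` class and its `prepare`, `activate`, `mount`, and `unmount` methods are hypothetical placeholders that only track state in memory, and `backup` and `report` are empty stand-ins for the real jobs. The step that puts the database into a consistent state is an assumption, shown only as a log entry. The point is the pattern: wait for synchronization with a timeout, check the PIT image, check the mount, and always unmount.

```python
import time

class StepFailed(Exception):
    """Raised when the validation after a step does not pass."""

class SimulatedReplica:
    """Hypothetical in-memory stand-in for the array's replica commands
    (for example a snapshot create/activate or a mirror establish/split)."""

    def __init__(self, name):
        self.name = name
        self.state = "idle"
        self.mounted_on = None

    def prepare(self):            # start the session / resynchronization
        self.state = "ready"      # a real array would finish this asynchronously

    def activate(self):           # create the point-in-time image
        if self.state != "ready":
            raise StepFailed(f"{self.name}: cannot activate from state {self.state}")
        self.state = "active"

    def mount(self, host):
        if self.state == "active":
            self.mounted_on = host
        return self.mounted_on == host

    def unmount(self):
        self.mounted_on = None

def wait_until(check, what, timeout_s, poll_s=0.05):
    """Poll check() until it returns True; fail the step on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return
        time.sleep(poll_s)
    raise StepFailed(f"timed out waiting for {what}")

def run_cycle(replica, host, task, log, timeout_s=5.0):
    """Refresh one replica, mount it, run the task, and clean up."""
    replica.prepare()
    wait_until(lambda: replica.state == "ready", f"{replica.name} to be ready", timeout_s)
    log.append(f"{replica.name}: ready")

    # Assumption: the database is put in a consistent state (e.g. hot-backup
    # mode or a brief I/O freeze) only while the PIT image is created.
    log.append("database: consistent state entered")
    try:
        replica.activate()
    finally:
        log.append("database: normal operation resumed")
    if replica.state != "active":
        raise StepFailed(f"{replica.name}: PIT image not active")
    log.append(f"{replica.name}: PIT image active")

    if not replica.mount(host):
        raise StepFailed(f"{replica.name}: mount on {host} failed")
    log.append(f"{replica.name}: mounted on {host}")
    try:
        task(replica)
        log.append(f"{replica.name}: {task.__name__} completed")
    finally:
        replica.unmount()
        log.append(f"{replica.name}: unmounted")

def backup(replica):   # hypothetical: invoke the backup application here
    pass

def report(replica):   # hypothetical: run the reporting jobs here
    pass

if __name__ == "__main__":
    log = []
    run_cycle(SimulatedReplica("backup_replica"), "backup_host", backup, log)
    run_cycle(SimulatedReplica("report_replica"), "report_host", report, log)
    print("\n".join(log))
```

Any failed validation raises `StepFailed`, which a scheduler wrapper could catch to send an alert instead of continuing with an unusable replica.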
Chapter 14

1. An organization is planning a data center migration. They can only afford a
maximum of two hours downtime to complete the migration. Explain how remote 
replication technology can be used to meet the downtime requirements. Why will 
the other methods not meet this requirement? 

Solution/Hint: 

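SAN-based remote replication technology can be used to minimize the
downtime because it provides non-disruptive data migration: the bulk of the
data is copied to the new data center while the applications keep running at
the source, and downtime is needed only for the final cutover (stopping the
applications, completing the last synchronization, and restarting at the target).
Conventional methods, such as backup and restore, need downtime for the
entire copy of data from one location to the other.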

2. Explain the RPO that can be achieved with synchronous, asynchronous, and disk- 
buffered remote replication. 

Solution/Hint: 

RPO that can be achieved with synchronous replication: zero or near zero
RPO that can be achieved with asynchronous replication: of the order of minutes
RPO that can be achieved with disk-buffered remote replication: of the order
of hours

3. Discuss the effects of a bunker failure in a three-site replication for the following 
implementation: 

• Multihop — synchronous + disk buffered 

• Multihop — synchronous + asynchronous 

• Multitarget 

Solution/Hint: 

Multihop - synchronous + disk buffered 

Same as synchronous + asynchronous 

Multihop - synchronous + asynchronous 

If there is a disaster at the bunker site or if there is a network link failure 
between the source and bunker sites, the source site will continue to 
operate as normal but without any remote replication. This situation is 
very similar to two-site replication when a failure/disaster occurs at the 
target site. The updates to the remote site cannot occur due to the failure in 
the bunker site. Hence, the data at the remote site keeps falling behind; but 
the advantage here is that if the source fails as well during this time, 
operations can be resumed at the remote site. RPO at the remote site 
depends on the time difference between the bunker site failure and source 
site failure. 

Multitarget 

A failure of the bunker or the remote site is not considered a disaster 
because normal operations can continue at the source site while remote 
disaster recovery protection is still available with the site that has not 
failed. A network link failure to either the bunker site (target 1) or the
remote site (target 2) enables business operations to continue
uninterrupted at the source site while remote disaster recovery protection
is still available with the site that can be reached.

4. Discuss the effects of a source failure in a three-site replication for the following 
implementation, and the available recovery options: 

• Multihop — synchronous + disk buffered 

• Multihop — synchronous + asynchronous 

• Multitarget 

Solution/Hint: 

Multihop - synchronous + disk buffered 

Same as synchronous + asynchronous 

Multihop - synchronous + asynchronous 

If there is a disaster at the source, operations are failed over to the bunker 
site with zero or near-zero data loss. But unlike the synchronous two-site 
situation, there is still remote protection at the third site. The RPO between 
the bunker and third site could be on the order of minutes. 

Multitarget 

If a source site disaster occurs, BC operations can be started with the
bunker (target 1) or the remote site (target 2). Under normal
circumstances, the data at the bunker site is more recent and up to
date. Hence, operations are resumed with the bunker site data. In some
circumstances, the data on the remote site is more current than the data on
the bunker site, for example, if the network links between the source and
bunker sites have failed. In this case, the workload would continue at the
source site with just the asynchronous replication to the remote site. If the
synchronous links are down long enough, then the data at the remote site
would be more current than the data at the bunker site. If a source site
disaster occurs at this time, the data on the remote site should be used to
recover. The network links between the bunker and remote sites are
activated in this situation to perform incremental synchronization. The
RPO is near zero if the bunker site data is used, and it is in minutes if the
remote site data is used.

5. A host generates 8,000 I/Os at peak utilization with an average I/O size of 32 KB. 
The response time is currently measured at an average of 12 ms during peak 
utilizations. When synchronous replication is implemented with a Fibre Channel 
link to a remote site, what is the response time experienced by the host if the 
network latency is 6 ms per I/O? 

Solution/Hint: 

Actual response time = 12 + (6 * 4) + (32 * 1024 / 8000) = 40.096 ms
Where 12 ms = current response time
6 ms per I/O = network latency
32 * 1024 / 8000 = data transfer time
Chapter 15

1. Research the following security protocols and explain how they are used:
Hint: Research work 

2. A storage array dials a support center automatically whenever an error is detected. 
The vendor's representative at the support center can log on to the service 
processor of the storage array through the Internet to perform diagnostics and 
repair. Discuss the impact of this feature in a secure storage environment and 
provide security methods that can be implemented to mitigate any malicious 
attacks through this gateway. 

Solution/Hint: 

Modification attacks 

In a modification attack, the unauthorized user attempts to modify 
information for malicious purposes. A modification attack can target data 
at rest or data in transit. These attacks pose a threat to data integrity. 
Denial of Service 

Denial of Service (DoS) attacks deny the use of resources to
legitimate users. These attacks generally do not involve access to or 
modification of information on the computer system. Instead, they pose a 
threat to data availability. The intentional flooding of a network or website 
to prevent legitimate access to authorized users is one example of a DoS 
attack. 

Eavesdropping 

When someone overhears a conversation without authorization, this
unauthorized access is called eavesdropping.
Snooping

This refers to accessing another user's data in an unauthorized
way. In general, snooping and eavesdropping are synonymous.
Management access 

Management access, whether monitoring, provisioning, or managing 
storage resources, is associated with every device within the storage 
network. Most management software supports some form of CLI, system 
management console or a web-based interface. 
• Controlling administrative access

Controlling administrative access to storage aims to 
safeguard against the threats of an attacker spoofing an 
administrator's identity or elevating another user's identity and 
privileges to gain administrative access. Both of these threats 
affect the integrity of data and devices. To protect against these 
threats, administrative access regulation and various auditing 
techniques are used to enforce accountability. 
• Protecting the management infrastructure

Protecting the management network infrastructure is also
necessary. Controls to protect the management network
infrastructure include encrypting management traffic, enforcing
management access controls, and applying IP network security
best practices. These best practices include the use of IP routers
and Ethernet switches to restrict traffic to certain devices and
management protocols.

3. Develop a checklist for auditing the security of a storage environment with SAN, 
NAS, and iSCSI implementations. Explain how you will perform the audit. 
Assume that you discover at least five security loopholes during the audit process. 
List them and provide control mechanisms that should be implemented to 
eliminate them. 



Solution/Hint: 



SAN, NAS, iSCSI 



• Servers (production, management, backup, third-party, NAS)

o What data or object was accessed, or targeted by an access attempt?

o What action was performed? 

o When was it executed?

o Who authorized and performed the action? 

o NFS/CIFS access (shared files) 

• Fabric/ IP network 

o Physical and logical access 

• Switches 

o Physical and logical access 
o Zoning 

• Storage 

o Which volume was accessed, or targeted by an access attempt?

o What action was performed? 

o When was it executed?

o Who authorized and performed the action? 

o LUN masking 

o Provisioning 

o Upgrade/replacement 

o Handling of physical media 



Process 



• Collect log and correlate 

• Analyze access and change control 

o Production and DR site 
o Backup and replication 
o Third-party services

• Check alerting mechanism 

• Check security controls 



o Physical 

o Administrative 

o Technological 

• Identify security gap 

• Documentation and recommendation 
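The "collect log and correlate" step can be illustrated with a small sketch. It assumes logs from servers, switches, and arrays have already been normalized into one hypothetical record format (`AuditEvent`, with an optional change-ticket reference standing in for "who authorized the action"); real device logs differ and would need parsing first. The sketch merges events into one timeline and flags two of the gaps listed below: repeated failed logins and changes without an authorization reference.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AuditEvent:
    """One normalized log record: who did what, to which object, when, and where."""
    time: datetime
    device: str                 # server, switch, or array that logged the event
    user: str
    action: str                 # e.g. "login", "zone_change", "lun_mask_change"
    target: str
    success: bool
    change_ticket: Optional[str] = None   # authorization reference, if any

def correlate(events, max_failed_logins=3):
    """Merge events from all devices into one timeline and flag findings."""
    timeline = sorted(events, key=lambda e: e.time)
    failed_logins = defaultdict(int)
    findings = []
    for e in timeline:
        if e.action == "login":
            if not e.success:
                failed_logins[e.user] += 1
                if failed_logins[e.user] == max_failed_logins:
                    findings.append(f"{e.user}: {max_failed_logins} failed logins "
                                    f"(last on {e.device} at {e.time:%H:%M})")
        elif e.success and e.change_ticket is None:
            findings.append(f"{e.user}: unauthorized {e.action} on {e.target} ({e.device})")
    return timeline, findings
```

The threshold of three failed logins and the change-ticket convention are illustrative choices; an audit would take both from the organization's own security policy.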



Five security loopholes 



1. Authentication allows unlimited login attempts

2. No firewall 

3. No authentication at the switch level 

4. No encryption for in-flight data 

5. Poor physical security at the data center 

Control 



1. Restrict the number of login attempts and use two-part passwords

2. Implement a firewall to block inappropriate or dangerous traffic

3. Authenticate users/administrators of FC switches using RADIUS (Remote
Authentication Dial In User Service), DH-CHAP (Diffie-Hellman Challenge
Handshake Authentication Protocol), etc.

4. Encrypt the traffic in transit

5. Increase security staffing and implement biometric access control
Chapter 16

1. Download the EMC ControlCenter simulator and the accompanying lab guide from
http://education.emc.com/ismbook and execute the steps detailed in the lab guide. 

Lab exercise 

2. A performance problem has been reported on a database. Monitoring confirms 
that at 12:00 p.m., a problem surfaced, and access to the database is severely
affected until 3:00 p.m. every day. This time slot is critical for business operations 
and an investigation has been launched. A reporting process that starts at 12:00 
p.m. contends for database resources and constrains the environment. What 
monitoring and management procedures, tools, and alerts would you establish to 
ensure accessibility, capacity, performance, and security in this environment? 
Hints: 

Monitoring:

Set up monitoring and reporting for accessibility, capacity,
performance, and security on production and replicated data.
Deploy monitoring and management tools, such as ECC Performance Manager,
to gather all performance statistics (historical data).
Performance analysis: the performance constraint is caused by the reporting
process contending with production access for database resources.

Management:

Requirement: The database needs to be replicated for the reporting process.

• Based on the requirement and infrastructure, deploy the chosen replication
software
• Provision storage capacity for replication
• Configure the environment for accessing replicated data (requires
configuration at the host, network, and storage levels)
• Configure adequate capacity based on the data retention policy and rate
of change
• Configure security for replicated data
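The alerts mentioned above can be sketched as simple threshold rules applied to collected performance samples. Everything here is hypothetical: the `Sample` fields, the 20 ms and 40 ms response-time thresholds, and the 85 percent CPU threshold are placeholders for values derived from the actual service levels. The helper `contended_hours` lists the hours that raised alerts, which can then be correlated with batch schedules such as the 12:00 p.m. reporting run.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    hour: int                # hour of day, 0-23
    response_ms: float       # average database I/O response time
    cpu_util: float          # host CPU utilization, 0.0-1.0

# Hypothetical alert thresholds; real values come from the service levels.
RESPONSE_WARN_MS = 20.0
RESPONSE_CRIT_MS = 40.0
CPU_WARN = 0.85

def evaluate(samples):
    """Return (hour, severity, message) alerts for samples that breach thresholds."""
    alerts = []
    for s in samples:
        if s.response_ms >= RESPONSE_CRIT_MS:
            alerts.append((s.hour, "critical", f"response time {s.response_ms:.0f} ms"))
        elif s.response_ms >= RESPONSE_WARN_MS:
            alerts.append((s.hour, "warning", f"response time {s.response_ms:.0f} ms"))
        if s.cpu_util >= CPU_WARN:
            alerts.append((s.hour, "warning", f"CPU utilization {s.cpu_util:.0%}"))
    return alerts

def contended_hours(alerts):
    """Hours with at least one alert: the window to correlate with batch jobs."""
    return sorted({hour for hour, _, _ in alerts})
```

Using two severity levels lets the warning fire early enough to act, for example by moving the reporting run onto a replica, before the critical threshold is reached.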

3. Research SMI-S and write a technical paper on different vendor implementations 
of storage management solutions that comply with SMI-S. 

Research work 



